Over the past few years, the artificial intelligence race looked like a story about infrastructure. Which company can build the biggest, most power-hungry data center, stock it with the most Nvidia GPUs and spend the most money? OpenAI, Amazon, Google, xAI — they’re all in a competition to build industrial-scale computing factories just to run the most powerful AI models. But it looks like developer Dan Woods just upended that story by running a data-center AI model on a MacBook.
And that could mean Apple wins the AI race after all.
Developer runs data-center AI model on MacBook
Woods announced on X this week that he managed to get Qwen3.5-397B — a cutting-edge “frontier” AI model that normally requires a server rack full of specialized hardware — running on a 48GB MacBook Pro with an M3 Max chip. The model occupies 209GB on disk (120GB compressed), far more than any laptop can hold in working memory. Yet Woods got it generating more than 5.5 tokens per second. That’s a shocking accomplishment for a consumer laptop — especially one from a company with a reputation for bringing up the rear in AI development.
To understand why this is remarkable, some context helps. Frontier AI models — the class of models that powers ChatGPT, Claude and Gemini at their most capable — are typically enormous. Running them requires loading their billions of parameters into fast memory. A 48GB MacBook has nowhere near enough RAM to do that for a 209GB model.
So how did Woods pull it off?
The secret: Apple’s own research
The key was a 2023 research paper Apple quietly published called “LLM in a Flash: Efficient Large Language Model Inference with Limited Memory.” The paper tackles the challenge of running LLMs that exceed available memory by storing model parameters in flash storage and streaming them into RAM on demand — guided by an inference cost model that minimizes data transfer and reads data in larger, more efficient chunks.
In other words, Apple’s engineers had already figured out theoretically how to run huge AI models on devices with limited RAM. The technique takes advantage of the fact that modern Macs use fast NVMe SSD storage — and, crucially, Apple silicon’s unified memory architecture, which lets the CPU and GPU work from the same memory pool in unusually tight coordination.
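The core idea can be illustrated with a toy sketch (this is not Woods’ actual code, and it drastically simplifies the paper’s method): keep the weights in a memory-mapped file on disk, and pull only the slice each layer needs into RAM at the moment that layer runs.

```python
import numpy as np

# Toy illustration of flash-streaming inference: weights live on disk,
# and only the layer currently executing is pulled into RAM.
# The real technique layers a cost model, chunked reads and sparsity
# prediction on top of this basic idea.

N_LAYERS, DIM = 4, 8

# Write toy "model weights" to disk once (a stand-in for a 200GB+ checkpoint).
weights = np.random.default_rng(0).standard_normal(
    (N_LAYERS, DIM, DIM)).astype(np.float32)
weights.tofile("weights.bin")

# Memory-map the file: nothing is loaded until a slice is actually read.
mmapped = np.memmap("weights.bin", dtype=np.float32, mode="r",
                    shape=(N_LAYERS, DIM, DIM))

def forward(x):
    """Run the toy model, streaming one layer's weights at a time."""
    for layer in range(N_LAYERS):
        w = np.asarray(mmapped[layer])  # only this layer's slice touches RAM
        x = np.tanh(x @ w)
    return x

out = forward(np.ones(DIM, dtype=np.float32))
print(out.shape)  # (8,)
```

The point of the sketch: peak RAM use scales with one layer, not the whole model — the rest of the time, the weights sit on the SSD.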
Woods combined what he learned from the paper with another insight. The Qwen model he chose is a “Mixture of Experts” (MoE) architecture. MoE models only activate a subset of their parameters for each token generated. That means the active weights can be streamed in from storage rather than all held in memory at once, according to developer Simon Willison, who wrote about Woods’ work. Woods dropped the number of active experts per token from 10 to 4. That compromise preserved most of the model’s quality while dramatically reducing memory demands.
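To see why cutting the number of active experts shrinks memory demands, here is a minimal, hypothetical top-k MoE routing sketch (illustrative only — not Qwen’s implementation): a router scores every expert for each token, but only the k highest-scoring experts are actually run, so only their weights need to be resident in memory.

```python
import numpy as np

# Minimal Mixture-of-Experts routing sketch (illustrative, not Qwen's code).
# Only the top-k experts run per token, so only their weights must be in
# memory; dropping k from 10 to 4 shrinks that working set accordingly.

rng = np.random.default_rng(1)
N_EXPERTS, DIM = 32, 16
experts = [rng.standard_normal((DIM, DIM)).astype(np.float32)
           for _ in range(N_EXPERTS)]
router = rng.standard_normal((DIM, N_EXPERTS)).astype(np.float32)

def moe_forward(x, k):
    """Route token x to its top-k experts and mix their outputs."""
    scores = x @ router                    # one score per expert
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    gate = np.exp(scores[top] - scores[top].max())
    gate /= gate.sum()                     # softmax over the selected experts
    # Only these k expert matrices would need to be streamed into RAM.
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

x = rng.standard_normal(DIM).astype(np.float32)
full = moe_forward(x, k=10)  # 10 active experts per token
lean = moe_forward(x, k=4)   # 4 active experts: less memory traffic
print(full.shape, lean.shape)
```

Both calls produce a full-size output; the k=4 version simply consults fewer experts per token, which is the tradeoff Woods accepted to fit the model’s working set on a laptop.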
He vibe-coded it with AI

Image: @danveloper on X.com
Here’s another twist that makes this story very 2026: Woods didn’t write all this low-level optimization code by hand. He fed Apple’s paper to Claude Code and used an autoresearch pattern to run 90 automated experiments, producing highly optimized code in MLX, Objective-C and Metal, Apple’s low-level graphics and compute language that runs directly on Apple silicon.
The result is open-source on GitHub, along with an AI-written technical paper describing the experiments in detail.
Data-center AI model on MacBook: Why it matters for Apple
The implications for Apple’s competitive position in AI are significant. The dominant narrative that Apple is behind — that Siri is a joke compared to ChatGPT, that Apple Intelligence is underwhelming, that the company missed the generative AI wave — could be misleading. Woods’ experiment suggests Apple may have quietly built the right hardware all along.
Apple silicon’s unified memory architecture lets CPU and GPU share the same high-bandwidth memory pool. And that looks like precisely the design needed for the flash-streaming technique Apple’s own researchers described. No other mainstream laptop platform has this. So MacBook Pro isn’t just a laptop that can run AI on the side. It may be the most capable personal AI computer on the market.
While competitors race to build billion-dollar data centers, the most powerful AI model you can run might soon be the one already sitting in your bag. Apple’s chip lead, combined with techniques like these, could make local AI on Mac — private, fast and free from cloud subscriptions — a genuine reality far sooner than anyone expected.
As Willison noted, the quality tradeoffs are still being evaluated. But it’s hard to overstate the breakthrough of simply getting it running. The AI race might not be won in a data center after all.